AITopics | iteration head

Iteration Head: A Mechanistic Study of Chain-of-Thought

Neural Information Processing SystemsMar-22-2026, 10:14:25 GMT

Chain-of-Thought (CoT) reasoning is known to improve Large Language Models both empirically and in terms of theoretical approximation power.However, our understanding of the inner workings and conditions of apparition of CoT capabilities remains limited.This paper helps fill this gap by demonstrating how CoT reasoning emerges in transformers in a controlled and interpretable setting.In particular, we observe the appearance of a specialized attention mechanism dedicated to iterative reasoning, which we coined iteration heads.We track both the emergence and the precise working of these iteration heads down to the attention level, and measure the transferability of the CoT skills to which they give rise between tasks.

artificial intelligence, natural language, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.62)

Add feedback

c50f8180ef34060ec59b75d6e1220f7a-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 01:02:23 GMT

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

c50f8180ef34060ec59b75d6e1220f7a-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 16:04:59 GMT

parity problem, sequence, transformer, (17 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Iteration Head: A Mechanistic Study of Chain-of-Thought

Neural Information Processing SystemsMay-27-2025, 15:47:26 GMT

Chain-of-Thought (CoT) reasoning is known to improve Large Language Models both empirically and in terms of theoretical approximation power.However, our understanding of the inner workings and conditions of apparition of CoT capabilities remains limited.This paper helps fill this gap by demonstrating how CoT reasoning emerges in transformers in a controlled and interpretable setting.In particular, we observe the appearance of a specialized attention mechanism dedicated to iterative reasoning, which we coined "iteration heads".We track both the emergence and the precise working of these iteration heads down to the attention level, and measure the transferability of the CoT skills to which they give rise between tasks.

chain-of-thought, iteration head, mechanistic study, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.31)

Add feedback

Iteration Head: A Mechanistic Study of Chain-of-Thought

Cabannes, Vivien, Arnal, Charles, Bouaziz, Wassim, Yang, Alice, Charton, Francois, Kempe, Julia

arXiv.org Artificial IntelligenceJun-4-2024

In the rapidly evolving field of artificial intelligence, Large Language Models (LLMs) have emerged as a pivotal component [45]. Their ability to understand, generate, and manipulate human language has opened up new avenues towards advanced machine intelligence. Interestingly, despite being primarily trained on next-token prediction tasks, LLMs are able to produce much more sophisticated answers when asked to generate steps of reasoning [30, 58]. This phenomenon, often referred to as Chain-of-Thought (CoT) reasoning, and illustrated on Table 1, appears paradoxical: on the one hand, LLMs are not explicitly programmed to reason; on the other hand, they are capable of following logical chains of thoughts to produce relatively complex answers. Table 1: Chain-of-Thought consists in eliciting reasoning steps before answering (A) a question (Q).

parity problem, sequence, transformer, (16 more...)

arXiv.org Artificial Intelligence

2406.02128

Genre: Research Report (1.00)

Technology: